Bilingual Unknown Word Alignment Tool for English-Thai
Abstract
This paper presents a bilingual English-Thai unknown word alignment tool that uses techniques based on the global and local characteristics of each word in parallel texts. The distribution and location of words in the texts are analyzed to generate candidate Thai unknown words for each English unknown word. The overall accuracy of the unknown word alignment is 90.32% on 6,000 bilingual English-Thai corpora. In addition, with an average of 4.5 candidate Thai unknown words per English unknown word, the tool can greatly reduce the time required compared with linguists doing the same work manually.
Similar resources
A Framework of 2-step Bilingual Alignment for SMT: in Case Study of Thai-English Translation
This paper presents a framework of a new word alignment process that can be used in an SMT development. The method was designed to include the quality of using dictionary as prior knowledge and the ability of co-occurrence to fill unknown words. The alignment method is split into two separated steps: firstly, the dictionary-based step to guarantee the accurate word-aligning and secondly, co-occu...
An Integrated Tool for Translation-Memory Maintenance
This paper presents an integrated tool to construct and maintain translation-memory for memory-based machine translation. This tool was aimed to automate constructing and validating translation-memory both in word and in phrase levels from English-Thai parallel texts. To align English-Thai words and phrases, the crucial problems that must be resolved include multiple-word-expression boundary am...
Improvement of Statistical Machine Translation using Character-Based Segmentation with Monolingual and Bilingual Information
We present a novel segmentation approach for Phrase-Based Statistical Machine Translation (PB-SMT) to languages where word boundaries are not obviously marked by using both monolingual and bilingual information and demonstrate that (1) unsegmented corpus is able to provide the nearly identical result compares to manually segmented corpus in PB-SMT task when a good heuristic character clustering...
Creating a Reusable English-Chinese Parallel Corpus for Bilingual Dictionary Construction
This paper first describes an experiment to construct an English-Chinese parallel corpus, then applying the Uplug word alignment tool on the corpus and finally produce and evaluate an English-Chinese word list. The Stockholm English-Chinese Parallel Corpus (SEC) was created by downloading English-Chinese parallel corpora from a Chinese web site containing law texts that have been manually trans...
Constraints on Tone Sensitivity in Novel Word Learning by Monolingual and Bilingual Infants: Tone Properties Are More Influential than Tone Familiarity
This study compared tone sensitivity in monolingual and bilingual infants in a novel word learning task. Tone language learning infants (Experiment 1, Mandarin monolingual; Experiment 2, Mandarin-English bilingual) were tested with Mandarin (native) or Thai (non-native) lexical tone pairs which contrasted static vs. dynamic (high vs. rising) tones or dynamic vs. dynamic (rising vs. falling) ton...